Selecting a Subset of Queries for Acquisition of Further Relevance Judgements

Authors

  • Mehdi Hosseini
  • Ingemar J. Cox
  • Natasa Milic-Frayling
  • Vishwa Vinay
  • Trevor Sweeting
Abstract

Assessing the relative performance of search systems requires a test collection with a pre-defined set of queries and corresponding relevance assessments. The state-of-the-art process of constructing test collections involves using a large number of queries and selecting a set of documents, submitted by a group of participating systems, to be judged per query. However, the initial set of judgements may be insufficient to reliably evaluate the performance of future, as yet unseen, systems. In this paper, we propose a method that expands the set of relevance judgements as new systems are being evaluated. We assume that there is a limited budget for acquiring additional relevance judgements. From the documents retrieved by the new systems we create a pool of unjudged documents. Rather than uniformly distributing the budget across all queries, we first select a subset of queries that are effective in evaluating systems and then uniformly allocate the budget only across these queries. Experimental results on the TREC 2004 Robust track test collection demonstrate the superiority of this budget allocation strategy.
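
The difference between the two allocation strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `allocate_budget`, the dictionary layout of the pools, and the choice to judge documents in pool order are all assumptions made for illustration, and the abstract does not say how the effective query subset is chosen, so the subset is simply passed in.

```python
def allocate_budget(pools, budget, selected=None):
    """Split `budget` judgements evenly over the chosen queries.

    pools:    dict mapping query id -> list of unjudged document ids
              (hypothetical layout, not taken from the paper).
    selected: query ids to spend the budget on; None means all queries.
    Returns a dict mapping query id -> documents to send for judging.
    """
    queries = sorted(pools) if selected is None else sorted(selected)
    if not queries:
        return {}
    # Even split; any remainder goes to the first few queries.
    share, extra = divmod(budget, len(queries))
    plan = {}
    for i, query in enumerate(queries):
        quota = share + (1 if i < extra else 0)
        # Assumption: documents are judged in pool order. If a pool is
        # smaller than its quota, the leftover budget stays unspent.
        plan[query] = pools[query][:quota]
    return plan


pools = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d4", "d5", "d6"],
    "q3": ["d7", "d8", "d9"],
    "q4": ["d10", "d11", "d12"],
}
uniform = allocate_budget(pools, budget=4)
focused = allocate_budget(pools, budget=4, selected=["q1", "q3"])
```

With a budget of four judgements, `uniform` assigns one document to each of the four queries, while `focused` assigns two documents each to `q1` and `q3` and none to the others.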

Similar resources

Creating a Test Collection for Citation-based IR Experiments

We present an approach to building a test collection of research papers. The approach is based on the Cranfield 2 tests but uses as its vehicle a current conference; research questions and relevance judgements of all cited papers are elicited from conference authors. The resultant test collection is different from TREC’s in that it comprises scientific articles rather than newspaper text and, t...

A New Framework for Distributed Multivariate Feature Selection

Feature selection is an important issue in classification. Selecting good features by maximizing relevance to the class label and minimizing redundancy among features improves classification accuracy. However, most current feature selection algorithms work only in a centralized manner. In this paper, we suggest a distributed version of the mRMR featu...

Modelling long-term relevance feedback

We propose a general relevance model, called the User Relevance Model, that formalises the decisions taken by a user during a query with respect to relevance judgements. Starting from a keyword-based query, the user is allowed to refine the document search using relevance feedback iterations where some subset of the result set is marked as relevant, and another subset is marked as non-relevant....

A Probabilistic Framework for Vague Queries and Imprecise Information in Databases

A probabilistic learning model for vague queries and missing or imprecise information in databases is described. Instead of retrieving only a set of answers, our approach yields a ranking of objects from the database in response to a query. By using relevance judgements from the user about the objects retrieved, the ranking for the actual query as well as the overall retrieval quality of the sy...

Relevance Judgements for Assessing Recall

Recall and precision have become the principal measures of the effectiveness of information retrieval systems. Inherent in these measures of performance is the idea of a relevant document. Although recall and precision are easily and unambiguously defined, selecting the documents relevant to a query has long been recognised as problematic. To compare performance of different systems, standard co...

Publication date: 2011